A generalized LR parser for text-to-speech synthesis
Abstract
The development of a parser for a Norwegian text-to-speech system is reported. The parser applies the Generalized LR (GLR) algorithm [1], a generalization of the well-known LR algorithm for parsing computer languages. This paper briefly describes the GLR algorithm, the integration of a probabilistic scoring model, our implementation of the parser in C++, attribute structures, the lexical interface, and the application of the parser to part-of-speech (POS) tagging for Norwegian. Applied to a small test set of about 4,000 words, this method correctly tags 96% of the known words, which is close to the performance of other POS taggers trained on large text databases [2] [3]. Of the unknown words, 85% are tagged correctly, and the probability of choosing the wrong pronunciation of a word from the lexicon is less than 0.1%.
Similar resources
An Experimental Real-Time Speech-to-Speech Translation System
This paper reports the current progress in the SPEECHTRANS project at the Center for Machine Translation which is a speech-to-speech translation project for real-time processing of speaker-independent noisy continuous speech input. SPEECHTRANS uses a custom speech recognition hardware and a phoneme-based generalized LR parser that uses a unification-based grammar formalism and a natural languag...
Robust Parsing of Noise Contaminated and Extra-grammatical Input: a Grammar Focused Approach
This thesis tackles the problem of parsing noise contaminated input by identifying and parsing the maximal subset of the input string that is found to be grammatical. I develop a parser that is based on the Generalized LR Parsing paradigm and performs this task efficiently. Since the parser uses the grammar to identify the meaningful words of the input, it can be viewed as a focusing tool. The pa...
Bilingual aligned corpora for speech to speech translation for Spanish, English and Catalan
In the framework of the EU-funded Project LC-STAR, a set of Language Resources (LR) for all the Speech to Speech Translation components (Speech recognition, Machine Translation and Speech Synthesis) was developed. This paper deals with the development of bilingual corpora in Spanish, US English and Catalan. The corpora were obtained from spontaneous dialogues in one of these three languages whi...
Empirical Support for Probabilistic GLR Parsing
This paper discusses the effectiveness of a new probabilistic generalized LR model (PGLR) in word-based parsing (morphological and syntactic analysis) tasks, in which we have to consider the word segmentation and multiple part-of-speech problems. Parsing a sentence from the morphological level makes the task much more complex because of the increase of parse ambiguity stemming from word segmenta...
Connectionist and Symbolic Processing in Speech-to-Speech Translation: The JANUS System
We present JANUS, a speech-to-speech translation system that utilizes diverse processing strategies including connectionist learning, traditional AI knowledge representation approaches, dynamic programming, and stochastic techniques. JANUS translates continuously spoken English utterances into Japanese and German speech utterances. The overall system performance on a corpus of conference regist...